Statistics in Medicine
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Propensity score adjustment addresses confounding by balancing covariates in subject treatment groups through matching, stratification, or weighting. Diagnostics test the success of adjustment. For example, if the standardized mean difference (SMD) for a relevant covariate exceeds a threshold like 0.1, the covariate is considered imbalanced and the study may be biased. Unfortunately, for studies with small or moderate numbers of subjects, the probability of identifying a study as biased because ...
A standard assumption for causal inference from observational data is that one has measured a sufficiently rich set of covariates to ensure that within covariate strata, subjects are exchangeable across observed treatment values. Skepticism about the exchangeability assumption in observational studies is often warranted because it hinges on investigators' ability to accurately measure covariates capturing all potential sources of confounding. Realistically, confounding mechanisms can rarely if ev...
The increase in methods focused on various types of survival outcomes has allowed practitioners to analyze data that are difficult or expensive to prospectively observe. Still, there are populations that are challenging to study. For example, obtaining a representative sample of couples attempting to become pregnant is difficult due to the dynamic nature of the population. This has led to an increase in the use of cross-sectional designs yielding backwards recurrent survival outcomes. In this pa...
Randomized controlled trials (RCTs) are considered the gold standard for assessing the causal effect of an exposure on an outcome, but are vulnerable to bias from missing data. When outcomes are missing not at random (MNAR), estimates from complete case analysis (CCA) will be biased. There is no statistical test for distinguishing between outcomes missing at random (MAR) and MNAR, and current strategies rely on comparing dropout proportions and covariate distributions, and using auxiliary inform...
We assess the potential financial impact of future gene therapies by identifying the 109 late-stage gene therapy clinical trials currently underway, estimating the prevalence and incidence of their corresponding diseases, developing novel mathematical models of the increase in quality-adjusted life years for each approved gene therapy, and simulating the launch prices and the expected spending of these therapies over a 15-year time horizon. The results of our simulation suggest that an expected ...
Multivariate datasets with a clustered structure are the natural framework for, e.g., multicentre clinical trials. We propose a number of methods aimed at detecting clusters with outlying correlation coefficients. While the methods can be used in a variety of settings, we focus mainly on their application to central statistical monitoring of clinical trials. In particular, we consider the issue of detecting centers (or other clusters of patients such as regions) with outlying correlation coeffic...
Reference distributions quantify the extremeness of clinical test results, typically relative to those of a healthy population. Intervals of these distributions are used in medical decision-making, but while there is much guidance for constructing them, the statistics of interpreting them for diagnosis have been less explored. Here we work directly in terms of the reference distribution, defining it as the likelihood in a posterior calculation of the probability of disease. We thereby identify a...
The logistic mixed-effects model has been a standard multivariate analysis method for analyzing clustered binary outcome data, e.g., from longitudinal studies, clustered randomized trials, and multi-center/regional studies. However, the resultant odds ratio estimator cannot be directly interpreted as an effect measure, and it is only interpreted as an approximation of the risk ratio estimator when the frequency of events is small. In this article, we propose a new statistical analysis method that enables...
Health research using electronic health records (EHR) has gained popularity, but misclassification of EHR-derived disease status and lack of representativeness of the study sample can result in substantial bias in effect estimates and can impact power and type I error. In this paper, we develop new strategies for handling disease status misclassification and selection bias in EHR-based association studies. We first focus on each type of bias separately. For misclassification, we propose three no...
We present our considerations for using multiple imputation to account for missing data in propensity score-weighted analysis with bootstrap percentile confidence intervals. We outline the assumptions underlying each of the methods, discuss the methodological and practical implications of our choices, and briefly point to alternatives. We made a number of choices a priori, for example to use logistic regression-based propensity scores to produce "standardized mortality ratio"-weights and Substan...
In the health and social sciences, two types of mixture model have been widely used by researchers to identify heterogeneous trajectories of participants within a population: latent class growth analysis (LCGA) and the growth mixture model (GMM). Both methods parametrically model trajectories of individuals, and capture latent trajectory classes, by using an expectation-maximization (E-M) algorithm. However, parametric modeling of trajectories using polynomial functions or monotonic spline funct...
There are challenges associated with recruiting children to take part in randomised clinical trials and as a result, compared to adults, in many disease areas we are less certain about which treatments are most safe and effective. This can lead to weaker recommendations about which treatments to prescribe in practice. However, it may be possible to borrow strength from adult evidence to improve our understanding of which treatments work best in children, and many different statistical methods a...
Understanding treatment effects on health-related outcomes using real-world data requires defining a causal parameter and imposing relevant identification assumptions to translate it into a statistical estimand. Semiparametric methods, like the targeted maximum likelihood estimator (TMLE), have been developed to construct asymptotically linear estimators of these parameters. To further establish the asymptotic efficiency of these estimators, two conditions must be met: 1)...
In real world data (RWD) studies, observed datasets are often subject to left truncation, which can bias estimates of survival parameters. Standard methods can only suitably account for left truncation when survival and entry time are independent. Therefore, in the dependent left truncation setting, it is important to quantify the magnitude and direction of estimator bias to determine whether an analysis provides valid results. We conduct simulation studies of common RWD analytic settings in ord...
Discrete Event Simulation (DES) is a flexible and computationally efficient approach for modeling diverse processes; however, DES remains underutilized in healthcare and medical decision-making due to a lack of reliable and reproducible implementations. We developed an open-source DES framework in R to simulate individual-level state-transition models (iSTMs) in continuous time accounting for treatment effects, time dependence on state residence, and age-dependent mortality. Our DES implementati...
Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, in addition to those required for the substantive analysis, imputation models often include other variables ("auxiliary variables"). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random. However, guidanc...
Network meta-analysis for survival outcome data often involves several studies that reported only dichotomized outcomes (i.e., the numbers of events and sample sizes of individual arms). To avoid the reporting biases that arise from eliminating these studies in the synthesis analyses, Woods et al. (2010; BMC Med Res Methodol 10:54) proposed a Bayesian approach to combine the survival and dichotomized outcome data using hierarchical models. However, the Bayesian methods require complicated computations involving t...
Estimating treatment effects from time-to-event data in observational studies requires careful adjustment for both confounding and informative censoring. While inverse probability of treatment weighting (IPTW) and inverse probability of censoring weighting (IPCW) have been used to address these sources of bias separately, their combined application remains underexplored, especially in high-dimensional, real-world datasets. In this paper, we benchmark IPTW, IPCW, and their combination to estimate...
Network meta-analysis (NMA) is a statistical technique for the comparison of treatment options. The nodes of the network are the competing treatments and edges represent comparisons of treatments in trials. Outcomes of Bayesian NMA include estimates of treatment effects, and the probabilities that each treatment is ranked best, second best and so on. How exactly network geometry affects the accuracy and precision of these outcomes is not fully understood. Here we carry out a simulation study and...
Incomplete information on HLA allele typing is a persistent problem when analyzing the role of Human Leukocyte Antigen (HLA) in transplantation. To refine the predictions possible with partial knowledge of HLA typing, some researchers use HaploStats statistics on the frequencies of haplotypes within specified ethnic/national populations to impute complete HLA allele typing. We evaluated methods that use imputation to predict patient outcomes after organ transplantation, with focus on prediction ...